ABSTRACT


A stratification oriented to crop area and yield estimation problems was performed using a clustering algorithm. The variables used were a set of agroclimatological characteristics measured in each of the 232 municipalities of the State of Rio Grande do Sul, Brazil. A nonhierarchical cluster analysis was used, and the pseudo F-statistic criterion was implemented to determine the "cut point" in the number of strata.


1. INTRODUCTION

To predict the crop production of a region, it is necessary to estimate two parameters, CA and P (crop area and yield, respectively), and to combine them through the expression

$$TP = CA \times P \qquad (1)$$

where TP denotes total production.
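The following is a minimal sketch of Equation (1) in Python. It assumes CA is expressed in hectares and P in kg/ha, so TP comes out in kilograms and is reported here in tonnes. The unit convention, the function name `total_production_t` and the example figures are hypothetical and are not taken from this study.

```python
def total_production_t(crop_area_ha: float, yield_kg_per_ha: float) -> float:
    """Equation (1), TP = CA * P, with the result converted from kg to tonnes."""
    if crop_area_ha < 0 or yield_kg_per_ha < 0:
        raise ValueError("crop area and yield must be non-negative")
    return crop_area_ha * yield_kg_per_ha / 1000.0


# Hypothetical region: 12,000 ha of wheat with an expected yield of 1,100 kg/ha
tp = total_production_t(12_000, 1_100)
print(f"TP = {tp:.0f} t")  # TP = 13200 t
```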

The estimation of CA and P depends on a data set, which can be obtained by different means: a statistical sampling system or a census data collection system in the area of interest. Aerial photographs and/or LANDSAT imagery are important means for CA estimation; for P, a yield prediction model can be implemented (see Baier, 1979, and Cappelletti et al., 1981).

Both of the above approaches require the area under study to be stratified in order to achieve an adequate confidence coefficient in the final estimates (Raj, 1968).

This paper reports results obtained in constructing strata by applying a cluster analysis algorithm to a set of agroclimatological variables whose values are, in general, averages of historical series.

The data refer to each of the 232 municipalities of the State of Rio Grande do Sul, Brazil. The municipality was adopted as the unit of analysis because the data were published at the municipality level.

Building homogeneous strata is the first step in a crop production estimation project. Since the parameters CA and P in Equation (1) depend on different characteristics, two sets of strata are required.



*Presented at the Sixteenth International Symposium on Remote Sensing of Environment, Buenos Aires, Argentina, June 2-9, 1982.
2. METHODOLOGY

The clustering technique has been widely used for grouping similar units. It works with a data matrix and a similarity measure.

The data matrix has dimension N × P, where N is the number of units and P is the number of characteristics observed or calculated for each unit.

The similarity measure used in this paper was the Euclidean distance in the observation space.

The objective of the clustering algorithm is to minimize the within-cluster sum of squares, following the K-means procedure of MacQueen (1967).

The algorithm works as follows. Let x(i,j) be the value of the j-th variable for the i-th unit, i = 1, ..., N, j = 1, ..., P, and let each of the N units lie in exactly one of K clusters. Denoting by $\bar{x}(s,j)$ the mean of the j-th variable over the units in the s-th cluster, the distance between the i-th unit and the s-th cluster is

$$D(i,s) = \left[ \sum_{j=1}^{P} \left( x(i,j) - \bar{x}(s,j) \right)^{2} \right]^{1/2}$$

and the error of the partition is

$$E\left[P(N,K)\right] = \sum_{i=1}^{N} D^{2}\left(i, s(i)\right)$$

where s(i) is the cluster containing the i-th unit and P(N,K) denotes a partition of the N units into K clusters.

The procedure searches for a partition with small E by moving units from one cluster to another, and it stops when no such move reduces E (Hartigan, 1975).
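The relocation procedure described above can be sketched in Python as follows. This is not the program used by the authors. It follows the single-unit move test described by Hartigan (1975): moving unit i from cluster a (size n_a) to cluster b (size n_b) changes E by n_b/(n_b+1)·D²(i,b) - n_a/(n_a-1)·D²(i,a), so a move is made only when that change is negative. The random choice of seed units for the initial partition and the function names are illustrative assumptions.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance, i.e. D(i,s)^2 in the notation above."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cluster_means(data, labels, k):
    """Return the mean vector and the size of each of the k clusters."""
    p = len(data[0])
    sums = [[0.0] * p for _ in range(k)]
    counts = [0] * k
    for row, s in zip(data, labels):
        counts[s] += 1
        for j, x in enumerate(row):
            sums[s][j] += x
    means = [[v / counts[s] for v in sums[s]] if counts[s] else None for s in range(k)]
    return means, counts


def partition_error(data, labels, k):
    """E[P(N,K)]: sum of squared distances of each unit to its own cluster mean."""
    means, _ = cluster_means(data, labels, k)
    return sum(sq_dist(row, means[s]) for row, s in zip(data, labels))


def kmeans_relocate(data, k, seed=0, max_passes=100):
    """Move single units between clusters while any move lowers E (Hartigan-style)."""
    rng = random.Random(seed)
    seeds = rng.sample(range(len(data)), k)  # assumption: random seed units
    labels = [min(range(k), key=lambda s: sq_dist(row, data[seeds[s]])) for row in data]
    for s, i in enumerate(seeds):
        labels[i] = s  # guarantees that no cluster starts empty
    for _ in range(max_passes):
        moved = False
        for i, row in enumerate(data):
            means, counts = cluster_means(data, labels, k)
            a = labels[i]
            if counts[a] == 1:
                continue  # never empty a cluster
            # Decrease in E from removing unit i from its current cluster a
            cost_out = counts[a] / (counts[a] - 1) * sq_dist(row, means[a])
            best, best_gain = a, 0.0
            for b in range(k):
                if b == a:
                    continue
                # Increase in E from adding unit i to cluster b
                cost_in = counts[b] / (counts[b] + 1) * sq_dist(row, means[b])
                if cost_out - cost_in > best_gain + 1e-12:
                    best, best_gain = b, cost_out - cost_in
            if best != a:
                labels[i] = best
                moved = True
        if not moved:  # no single move reduces E: stop
            break
    return labels


# Usage on two well-separated groups of hypothetical points
data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = kmeans_relocate(data, k=2)
print(labels, partition_error(data, labels, 2))
```

Recomputing every cluster mean before each candidate move keeps the sketch short. For a few hundred units, such as the 232 municipalities, this cost is negligible, although a production version would update the means incrementally.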

Clusters were generated in a nonhierarchical process. The process began with one group and stopped when the number of groups reached the cut point given by the pseudo F-statistic (PFS) criterion function of Vogel and Wong (1979).

The PFS criterion provided the optimal number of groups using the weighted ratio of the traces of the matrices B and W, which are, respectively, the between-group and within-group sums-of-squares matrices from the multivariate analysis of variance (MANOVA) of the data set.
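The code below assumes the usual form of the pseudo F-statistic, the Caliński-Harabasz ratio PFS = [tr(B)/(K-1)] / [tr(W)/(N-K)], in which each trace is weighted by its degrees of freedom. The criterion can then be sketched as follows. Whether Vogel and Wong (1979) weight the traces in exactly this way is an assumption here, and the function name `pseudo_f` is illustrative.

```python
def pseudo_f(data, labels):
    """Pseudo F-statistic [tr(B)/(K-1)] / [tr(W)/(N-K)] for one partition."""
    n, p = len(data), len(data[0])
    groups = sorted(set(labels))
    k = len(groups)
    if not 1 < k < n:
        raise ValueError("the statistic needs 1 < K < N")
    grand = [sum(row[j] for row in data) / n for j in range(p)]
    tr_b = 0.0  # trace of the between-groups sums-of-squares matrix B
    tr_w = 0.0  # trace of the within-groups sums-of-squares matrix W
    for g in groups:
        members = [row for row, s in zip(data, labels) if s == g]
        m = len(members)
        mean = [sum(row[j] for row in members) / m for j in range(p)]
        tr_b += m * sum((mean[j] - grand[j]) ** 2 for j in range(p))
        tr_w += sum((row[j] - mean[j]) ** 2 for row in members for j in range(p))
    if tr_w == 0.0:
        return float("inf")  # every group collapses to a single point
    return (tr_b / (k - 1)) / (tr_w / (n - k))


data = [[1], [2], [3], [10], [11], [12]]
print(pseudo_f(data, [0, 0, 0, 1, 1, 1]))  # 121.5
print(pseudo_f(data, [0, 0, 1, 1, 2, 2]))  # smaller: this 3-group split is worse
```

Under this criterion, the partition with the largest statistic, and hence its number of groups K, is preferred. Applied to successive K-means partitions with K = 2, 3, ..., it gives a cut point of the kind described in the text.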

3. VARIABLES AND DATA SET


The variables used were:

wheat cultivated area (ha)
wheat yield (kg/ha)
average farm size (ha)
agricultural land suitable for wheat (ha)
average temperature (°C)
potential evapotranspiration (mm)
rainfall in normal years (mm)
rainfall in dry years (mm)
average yearly run-off (%)
useful fraction of rainfall in normal years (mm)
useful fraction of rainfall in dry years (mm)
internal drainage in normal years (mm)
internal drainage in dry years (mm)
moisture deficit in normal years (mm)
moisture deficit in dry years (mm)
necessary minimum rainfall (mm)

The data set was taken from the Anuario Estatistico do Rio Grande do Sul (1968, 1969, 1970 and 1976) and from the Ministerio da Agricultura (1976).

4. RESULTS AND DISCUSSION 
4.1 WHEAT AREA ESTIMATION (CA)

Two variables were considered for each municipality: relative crop area (CRA) and normalized average farm size (AFS).

The first variable represents the density of land cultivated with wheat and should provide strata that are homogeneous with respect to the importance of wheat in the agricultural landscape. Average farm size was selected to represent problems that might be encountered in classifying LANDSAT data with different field sizes.
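The Euclidean distance of Section 2 is sensitive to the units of each variable, so CRA (in %) and AFS (in ha) need to be placed on comparable scales before clustering. The paper does not state the denominator used for CRA or the normalization applied to AFS. The sketch below therefore assumes that CRA is wheat area as a percentage of total municipal area and applies z-score standardization to both variables. These choices, the function names and the sample figures are all hypothetical.

```python
from statistics import mean, pstdev


def relative_crop_area(wheat_ha, total_ha):
    """CRA in %: wheat area over total municipal area (the denominator is an assumption)."""
    return [100.0 * w / t for w, t in zip(wheat_ha, total_ha)]


def zscores(values):
    """Standardize to mean 0 and (population) standard deviation 1."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0:
        raise ValueError("a constant variable cannot be standardized")
    return [(v - mu) / sd for v in values]


# Hypothetical municipalities, for illustration only
wheat = [4500.0, 900.0, 120.0]        # wheat area, ha
total = [50000.0, 30000.0, 60000.0]   # municipal area, ha
farm_size = [25.0, 90.0, 280.0]       # average farm size, ha

cra = relative_crop_area(wheat, total)  # approximately [9.0, 3.0, 0.2]
# Feature vectors (CRA, AFS) on a common scale for the Euclidean distance
features = list(zip(zscores(cra), zscores(farm_size)))
```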

The clustering algorithm gave four groups as the best partition (Table I). 


Stratum   Mean CRA   Mean AFS   S.D. CRA   S.D. AFS   No. of
No.       (%)        (ha)       (%)        (ha)       municipalities
1          8.5        22.7       9.4        9.9       177
2          4.8        85.9       7.6       22.8        34
3          4.2       192.8       6.8       28.2        12
4          1.2       279.1       1.4       24.5         9

Table I. Four strata for CRA and AFS

The mean values in Table I show a negative relationship between CRA and AFS: municipalities with a large average farm size devote a larger share of their land to livestock raising rather than to wheat.
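As a quick numerical check of this statement, the code below computes the Pearson correlation between the four stratum means of CRA and AFS in Table I. It uses only the four stratum means, not the 232 municipal values. The result therefore illustrates the direction of the relationship rather than measuring its strength across municipalities. The helper `pearson` is written out here for illustration.

```python
from math import sqrt

# Stratum means from Table I
cra_mean = [8.5, 4.8, 4.2, 1.2]         # CRA, %
afs_mean = [22.7, 85.9, 192.8, 279.1]   # AFS, ha


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson(cra_mean, afs_mean)
print(f"r = {r:.2f}")  # strongly negative
```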

Figure 1 shows the geographical location of the strata. 

For comparison with the results in Table I, the state was also divided into four strata using the variable CRA alone (Table II).


Stratum   Mean CRA   S.D. CRA   No. of
No.       (%)        (%)        municipalities
1         35.6       4.1         10
2         21.2       3.4         26
3         10.1       2.6         55
4          1.8       1.9        141

Table II. Four strata for variable CRA
Figure 2 shows the geographical location of these strata.
4.2 WHEAT YIELD ESTIMATION (P)

The original data set used in this stratification included the last fourteen variables listed in Section 3.

These data relate to the wheat growing season, and some of them are average values of historical series.

These data were subjected to a Principal Component Analysis (Cooley and Lohnes, 1971). The factor scores of the first five principal components, which account for 93.5% of the total variance, were then fed to the clustering algorithm.

The principal components technique is described in the book cited above. Recall that the method produces uncorrelated linear functions of the original variables; when all components are retained, no information is lost.

Depending on the data being used, it may be possible to recognize the physical significance of these new variables. When this is possible, the method also reduces the quantity of basic data that needs to be used.
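The two properties invoked above can be checked on the simplest case: two standardized variables, whose principal axes are known in closed form. The properties are that the components are uncorrelated and that nothing is lost when all components are kept. The values below are hypothetical, and the variable names are illustrative only.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(xs):
    mu, sd = mean(xs), pstdev(xs)
    return [(x - mu) / sd for x in xs]


def cov(a, b):
    """Population covariance (divides by n, matching pstdev)."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Hypothetical values of two correlated variables (mm) in five municipalities
x1 = [600.0, 650.0, 700.0, 720.0, 800.0]
x2 = [420.0, 470.0, 480.0, 530.0, 560.0]
z1, z2 = standardize(x1), standardize(x2)
r = cov(z1, z2)  # correlation between the two variables

# For two standardized variables the principal axes are (1, 1)/sqrt(2) and (1, -1)/sqrt(2)
y1 = [(a + b) / sqrt(2) for a, b in zip(z1, z2)]
y2 = [(a - b) / sqrt(2) for a, b in zip(z1, z2)]

print(round(cov(y1, y2), 12))            # components are uncorrelated
print(round(cov(y1, y1) - (1 + r), 12))  # variance of the first component equals 1 + r
# Keeping both components loses nothing: the rotation is inverted exactly
z1_back = [(a + b) / sqrt(2) for a, b in zip(y1, y2)]
print(max(abs(u - v) for u, v in zip(z1, z1_back)))
```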

Table III shows the eigenvalues of the factors and the cumulative percentage of the trace.


Factor            Eigenvalue   Cumulative percent
                               of the trace
1                 7.69         54.9
2                 2.05         69.6
3                 1.73         81.9
4                 1.01         89.2
5                 0.60         93.5
Other 9 factors   -            6.5 (remaining)

Table III. Output of the Principal Components Analysis.
First five principal factors
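The cumulative percentages in Table III can be reproduced from the eigenvalues if the analysis was carried out on the correlation matrix of the fourteen standardized variables, whose trace is 14. The figures suggest this, but it is an inference rather than something the paper states. With this assumption, the last two rows come out as 89.1 and 93.4 instead of 89.2 and 93.5. These small differences are consistent with the eigenvalues having been rounded to two decimals.

```python
# Eigenvalues of the first five factors, from Table III
eigenvalues = [7.69, 2.05, 1.73, 1.01, 0.60]
# Assumption: PCA on the correlation matrix of 14 standardized variables,
# so the trace (sum of all 14 eigenvalues) equals 14
TRACE = 14.0


def cumulative_percent(eigs, trace):
    """Running share of the trace explained by the first 1, 2, ... factors."""
    out, acc = [], 0.0
    for lam in eigs:
        acc += lam
        out.append(100.0 * acc / trace)
    return out


cum = cumulative_percent(eigenvalues, TRACE)
print([round(c, 1) for c in cum])  # [54.9, 69.6, 81.9, 89.1, 93.4]
```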

Using the factor score coefficients, which are an output of the principal components analysis, the factor scores for each municipality were calculated, and these scores were fed to the clustering algorithm.
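Below is a minimal sketch of this step. It assumes the factor score coefficients are applied to z-standardized variables, which is the usual convention when the analysis is run on the correlation matrix. Under that assumption, the score of municipality i on factor k is the sum over j of c(j,k)·z(i,j). The use of the population standard deviation, the function names, the data and the coefficient values are all hypothetical placeholders, not the coefficients obtained in the study.

```python
from statistics import mean, pstdev


def standardize_columns(data):
    """z-standardize each variable (column) across municipalities (rows)."""
    cols = list(zip(*data))
    stats = [(mean(c), pstdev(c)) for c in cols]
    return [[(x - mu) / sd for x, (mu, sd) in zip(row, stats)] for row in data]


def factor_scores(data, coef):
    """Score of municipality i on factor k: sum over j of z(i,j) * coef[j][k]."""
    z = standardize_columns(data)
    n_factors = len(coef[0])
    return [
        [sum(zij * coef[j][k] for j, zij in enumerate(row)) for k in range(n_factors)]
        for row in z
    ]


# Hypothetical example: 4 municipalities, 3 variables, 2 retained factors
data = [
    [18.5, 1400.0, 120.0],
    [19.2, 1550.0, 95.0],
    [17.1, 1700.0, 60.0],
    [16.4, 1850.0, 40.0],
]
coef = [  # hypothetical factor score coefficients (variables x factors)
    [0.40, 0.55],
    [-0.38, 0.30],
    [0.41, -0.20],
]
scores = factor_scores(data, coef)  # one row of factor scores per municipality
```

The rows of `scores` would then replace the original variables as input to the clustering step.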

Table IV shows the number of municipalities in each stratum.


Stratum   No. of
No.       municipalities
1         57
2         72
3         74
4         29

Table IV. Strata for five principal components with
fourteen agro-meteorological variables

Figure 3 shows the geographical position of the strata.
The most relevant characteristics of each stratum can be summarized as follows:

Stratum I.

This region is called "campanha" and corresponds to an extensive natural 
grassland zone traditionally used for livestock grazing. 

Stratum II.

This region is called "Depressão Central" and is characterized by small-scale diversified agriculture.

Stratum III.

This region is called "Planalto Médio". It is the soybean-wheat cropping region of the state. Wheat is grown in winter, and this stratum has the highest yields in the state.

Stratum IV.

Wheat is not cultivated in this region, which contains the urban and industrial zone of Greater Porto Alegre, the state capital. The stratum also includes the Atlantic coastal strip, where there are numerous areas of sand dunes. The other two subregions of this stratum lie in the livestock-raising zone.

5. CONCLUSIONS 

A stratification oriented to crop area and yield estimation problems was 
performed. 

The clustering algorithm produced good results: the geographic location of the strata appears logical, and the strata seem to represent distinct conditions. In addition, the within-strata sum of squares was minimized with a set of agro-meteorological variables considered simultaneously.

The region chosen for applying the procedure has been extensively studied, so it is possible to validate the criterion used. Further work is needed to improve the final results.
Figure 2. Four strata for CRA (%)








